Abstract

Grammar-based compression, where one replaces a long string by a small context-free grammar 
that generates the string, is a simple and powerful paradigm that captures many popular compression 
schemes. Given a grammar, the random access problem is to compactly represent the grammar while 
supporting random access, that is, given a position in the original uncompressed string report the 
character at that position. In this paper we study the random access problem with the finger search 
property, that is, the time for a random access query should depend on the distance between a 
specified index $f$, called the finger, and the query index $i$. We consider both a static variant, where
we first place a finger and subsequently access indices near the finger efficiently, and a dynamic
variant where also moving the finger such that the time depends on the distance moved is supported.

Let $n$ be the size of the grammar, and let $N$ be the size of the string. For the static variant we
give a linear space representation that supports placing the finger in $O(\log N)$ time and subsequently
accessing in $O(\log D)$ time, where $D$ is the distance between the finger and the accessed index.
For the dynamic variant we give a linear space representation that supports placing the finger in
$O(\log N)$ time and accessing and moving the finger in $O(\log D + \log\log N)$ time. Compared to the
best linear space solution to random access, we improve an $O(\log N)$ query bound to $O(\log D)$ for the
static variant and to $O(\log D + \log\log N)$ for the dynamic variant, while maintaining linear space.
As an application of our results we obtain an improved solution to the longest common extension 
problem in grammar compressed strings. To obtain our results, we introduce several new techniques 
of independent interest, including a novel van Emde Boas style decomposition of grammars. 


1 Introduction 


Grammar-based compression, where one replaces a long string by a small context-free grammar that 
generates the string, is a simple and powerful paradigm that captures many popular compression schemes 
including the Lempel-Ziv family [47], Sequitur [36], Run-Length Encoding, Re-Pair [33], and many
more [2, 4, 30, 41, 48]. All of these are or can be transformed into equivalent grammar-based
compression schemes with little expansion [14, 39].


Given a grammar $\mathcal{S}$ representing a string $S$, the random access problem is to compactly represent
$\mathcal{S}$ while supporting fast access queries, that is, given an index $i$ in $S$ to report $S[i]$. The random
access problem is one of the most basic primitives for computation on grammar compressed strings, and 
solutions to the problem are a key component in a wide range of algorithms and data structures for 
grammar compressed strings [5, 8, 28, 43, 44].

In this paper we study the random access problem with the finger search property, that is, the time 
for a random access query should depend on the distance between a specified index f, called the finger, 
and the query index i. We consider two variants of the problem. The first variant is static finger search, 
where we can place a finger with a setfinger operation and subsequently access positions near the finger 
efficiently. The finger can only be moved by a new setfinger operation, and the time for setfinger is 
independent of the distance to the previous position of the finger. The second variant is dynamic finger 
search, where we also support a movefinger operation that updates the finger such that the update time 
depends on the distance the finger is moved. 

Our main results are efficient solutions to both finger search problems. To state the bounds, let $n$ be the
size of the grammar $\mathcal{S}$, and let $N$ be the size of the string $S$. For the static finger search problem, we give an
$O(n)$ space representation that supports setfinger in $O(\log N)$ time and access in $O(\log D)$ time, where
$D$ is the distance between the finger and the accessed index. For the dynamic finger search problem, we
give an $O(n)$ space representation that supports setfinger in $O(\log N)$ time and movefinger and access in
$O(\log D + \log\log N)$ time. The best linear space solution for the random access problem uses $O(\log N)$
time for access. Hence, compared to that solution we improve the $O(\log N)$ bound to $O(\log D)$ for the static
version and to $O(\log D + \log\log N)$ for the dynamic version, while maintaining linear space. These are
the first non-trivial bounds for the finger search problems.

As an application of our results we also give a new solution to the longest common extension problem
on grammar compressed strings [9, 28]. Here, the goal is to compactly represent $\mathcal{S}$ while supporting
fast lce queries, that is, given a pair of indices $i, j$ to compute the length of the longest common prefix
of $S[i, N]$ and $S[j, N]$. We give an $O(n)$ space representation that answers queries in $O(\log N + \log^2 \ell)$ time,
where $\ell$ is the length of the longest common prefix. The best $O(n)$ space solution for this problem uses
$O(\log N \log \ell)$ time, and hence our new bound is always at least as good and better whenever $\ell = o(N^{\epsilon})$.


1.1 Related Work 

We briefly review the related work on the random access problem and finger search. 


Random Access in Grammar Compressed Strings First note that naively we can store $S$ explicitly
using $O(N)$ space and report any character in constant time. Alternatively, we can compute
and store the sizes of the strings derived by each grammar symbol in $\mathcal{S}$ and use this to simulate a
top-down search on the grammar's derivation tree in constant time per node. This leads to an $O(n)$
space representation using $O(h)$ time, where $h$ is the height of the grammar [25]. Improved succinct
space representations of this solution are also known. Bille et al. [10] gave a solution using $O(n)$ space
and $O(\log N)$ time, thus achieving a query time independent of the height of the grammar. Verbin and
Yu [46] gave a near matching lower bound by showing that any solution using $O(n \log^{O(1)} N)$ space must
use $\Omega(\log^{1-\epsilon} N)$ time. Hence, we cannot hope to obtain significantly faster query times within $O(n)$
space. Finally, Belazzougui et al. very recently showed that with superlinear space slightly faster
query times are possible. Specifically, they gave a solution using $O(n\tau \log_\tau N/n)$ space and $O(\log_\tau N)$
time, where $\tau$ is a trade-off parameter. For $\tau = \log^{\epsilon} N$ this is $O(n \log^{\epsilon} N)$ space and $O(\log N/\log\log N)$
time. Practical solutions to this problem have been considered in [6].
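
The size-based top-down search described above can be sketched as follows. This is a minimal illustration, assuming the grammar is a straight-line program stored in a Python dictionary; the rule names (`X1`, `X2`, ...) and the example grammar are hypothetical and not taken from the paper.

```python
# Terminal rules map to a character, all other rules to a (left, right) pair.
RULES = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # ab
    "X2": ("X1", "A"),   # aba
    "X3": ("X2", "X1"),  # abaab
    "X4": ("X3", "X2"),  # abaababa
}

def compute_sizes(rules):
    """Length of the string derived by every symbol, memoised over the DAG."""
    sizes = {}

    def size(sym):
        if sym not in sizes:
            rhs = rules[sym]
            sizes[sym] = 1 if isinstance(rhs, str) else size(rhs[0]) + size(rhs[1])
        return sizes[sym]

    for sym in rules:
        size(sym)
    return sizes

def access(rules, sizes, root, i):
    """Return character i (1-indexed) of the string derived by root."""
    if not 1 <= i <= sizes[root]:
        raise IndexError(i)
    sym = root
    while not isinstance(rules[sym], str):
        left, right = rules[sym]
        if i <= sizes[left]:
            sym = left           # position lies in the left part
        else:
            i -= sizes[left]     # skip the left part and go right
            sym = right
    return rules[sym]
```

Each loop iteration descends one level of the derivation tree, so a query costs time proportional to the height $h$ of the grammar, matching the $O(h)$ bound of this simple solution.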

The above solutions all generalize to support decompression of an arbitrary substring of length $D$
in time $O(t_{access} + D)$, where $t_{access}$ is the time for access (and even faster for small alphabets). We
can extend this to a simple solution to finger search (static and dynamic). The key idea is to implement
setfinger as a random access and access and movefinger by decompressing or traversing, respectively, the
part of the grammar in-between the two positions. This leads to a solution that uses $O(t_{access})$ time for
setfinger and $O(D)$ time for access and movefinger.

Another closely related problem is the bookmarking problem, where a set of positions, called bookmarks,
are given at preprocessing time and the goal is to support fast substring decompression from
any bookmark in constant or near-constant time per decompressed character [16, 21]. In other words,
bookmarking allows us to decompress a substring of length $D$ in time $O(D)$ if the substring crosses
a bookmark. Hence, with bookmarking we can improve the $O(t_{access} + D)$ time solution for substring
decompression to $O(D)$ whenever we know the positions of the substrings we want to decompress at
preprocessing time. A key component in the current solutions to bookmarking is to trade off the $\Omega(D)$
time we need to pay to decompress and output the substring. Our goal is to support access without
decompressing in $o(D)$ time and hence this idea does not immediately apply to finger search.


Finger Search Finger search is a classic and well-studied concept in data structures, see e.g.,
[17, 19, 27, 32, 34, 38, 40, 42] and the survey [12]. In this setting, the goal is to maintain a dynamic dictionary
data structure such that searches have the finger search property. Classic textbook examples of efficient
finger search dictionaries include splay trees, skip lists, and level linked trees. Given a comparison based
dictionary with $n$ elements, we can support optimal searching in $O(\log n)$ time and finger searching in
$O(\log d)$ time, where $d$ is the rank distance between the finger and the query. Note the similarity to
our compressed results that reduce an $O(\log N)$ bound to $O(\log D)$.
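
The $O(\log d)$ bound can be illustrated on the simplest comparison based dictionary, a sorted array: gallop outward from the finger in doubling steps, then binary search the bracketed range. The sketch below is an illustration only, on a static array rather than one of the dynamic structures cited above; the function name `finger_search` is an assumption.

```python
from bisect import bisect_left

def finger_search(a, f, key):
    """Leftmost index j with a[j] >= key (len(a) if none), searching from finger f.

    a is a non-empty sorted list and 0 <= f < len(a).
    """
    n = len(a)
    if a[f] < key:
        lo, step = f, 1                  # invariant: a[lo] < key
        while lo + step < n and a[lo + step] < key:
            lo += step
            step *= 2
        hi = min(n, lo + step)           # a[hi] >= key, or hi == n
        return bisect_left(a, key, lo + 1, hi)
    hi, step = f, 1                      # invariant: a[hi] >= key
    while hi - step >= 0 and a[hi - step] >= key:
        hi -= step
        step *= 2
    lo = max(0, hi - step)               # a[lo] < key, or lo == 0
    return bisect_left(a, key, lo, hi)
```

The galloping phase makes a number of probes logarithmic in the rank distance $d$ between the finger and the answer, and the final binary search runs on a range of length $O(d)$.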
1.2 Our results


We now formally state our results. Let $S$ be a string of length $N$ compressed into a grammar $\mathcal{S}$ of size
$n$. Our goal is to support the following operations on $\mathcal{S}$.

access($i$): return the character $S[i]$.

setfinger($f$): set the finger at position $f$ in $S$.

movefinger($f$): move the finger to position $f$ in $S$.

The static finger search problem is to support access and setfinger, and the dynamic finger search problem is to
support all three operations. We obtain the following bounds for the finger search problems.

Theorem 1 Let $\mathcal{S}$ be a grammar of size $n$ representing a string $S$ of length $N$. Let $f$ be the current
position of the finger, and let $D = |f - i|$ for some $i$. Using $O(n)$ space we can support either:

(i) setfinger($f$) in $O(\log N)$ time and access($i$) in $O(\log D)$ time.

(ii) setfinger($f$) in $O(\log N)$ time, movefinger($i$) and access($i$) both in $O(\log D + \log\log N)$ time.

Compared to the previous best linear space solution, we improve the $O(\log N)$ bound to $O(\log D)$ for
the static variant and to $O(\log D + \log\log N)$ for the dynamic variant, while maintaining linear space.
These are the first non-trivial solutions to the finger search problems. Moreover, the logarithmic bound
in terms of $D$ may be viewed as a natural grammar compressed analogue of the classic uncompressed
finger search solutions. We note that Theorem 1 is straightforward to generalize to multiple fingers.
Each additional finger can be set in $O(\log N)$ time, uses $O(\log N)$ additional space, and given any finger
$f$, we can support access($i$) in $O(\log D_f)$ time, where $D_f = |f - i|$.


1.3 Technical Overview 


To obtain Theorem 1 we introduce several new techniques of independent interest. First, we consider
a variant of the random access problem, which we call the fringe access problem. Here, the goal is to
support fast access close to the beginning or end (the fringe) of a substring derived by a grammar symbol.
We present an $O(n)$ space representation that supports fringe access from any grammar symbol $v$ in time
$O(\log D_v + \log\log N)$, where $D_v$ is the distance from the fringe in the string $S(v)$ derived by $v$ to the
queried position. The key challenge is designing a data structure for efficient navigation in unbalanced
grammars.

The main component in our solution to this problem is a new recursive decomposition. The decomposition
resembles the classic van Emde Boas data structure [45], in the sense that we recursively partition
the grammar into a hierarchy of depth $O(\log\log N)$ consisting of subgrammars generating strings of
lengths $N^{1/2}, N^{1/4}, N^{1/8}, \ldots$. We then show how to implement fringe access via predecessor queries on
special paths produced by the decomposition. We cannot afford to explicitly store a predecessor data
structure for each special path. However, using a technique due to Bille et al. [10], we can represent
all the special paths compactly in a tree and instead implement the predecessor queries as weighted
ancestor queries on the tree. This leads to an $O(n)$ space solution with $O(\log D_v + (\log\log N)^2)$ query
time. Whenever $D_v \ge 2^{(\log\log N)^2}$ this matches our desired bound of $O(\log D_v + \log\log N)$. To handle
the case when $D_v < 2^{(\log\log N)^2}$ we use an additional decomposition of the grammar and further reduce
the problem to weighted ancestor queries on trees of small weighted height. Finally, we give an efficient
solution to weighted ancestor for this specialized case that leads to our final result for fringe access.

Next, we use our fringe access result to obtain our solution to the static finger search problem. The
key idea is to decompose the grammar into heavy paths as done by Bille et al. [10], which has the
property that any root-to-leaf path in the directed acyclic graph representing the grammar consists of
at most $O(\log N)$ heavy paths. We then use this to compactly represent the finger as a sequence of the
heavy paths. To implement access, we binary search the heavy paths in the finger to find an exit point
on the finger, which we then use to find an appropriate node to apply our solution to fringe access on.
Together with a few additional tricks this gives us Theorem 1(i).

Unfortunately, the above approach for the static finger search problem does not extend to the dynamic
setting. The key issue is that even a tiny local change in the position of the finger can change $\Theta(\log N)$
heavy paths in the representation of the finger, hence requiring at least $\Omega(\log N)$ work to implement
movefinger. To avoid this we give a new compact representation of the finger based on both heavy paths
and the special paths obtained from our van Emde Boas decomposition used in our fringe access data
structure. We show how to efficiently maintain this representation during local changes of the finger,
ultimately leading to Theorem 1(ii).

1.4 Longest Common Extensions 

As an application of Theorem 1 we give an improved solution to the longest common extension problem in
grammar compressed strings. The first solution to this problem is due to Bille et al. They showed
how to extend random access queries to compute Karp-Rabin fingerprints. Combined with an exponential
search this leads to a linear space solution to the longest common extension problem using $O(\log N \log \ell)$
time, where $\ell$ is the length of the longest common extension. We note that we can plug in any of the
above mentioned random access solutions. More recently, Nishimoto et al. used a completely different
approach to get $O(\log N + \log \ell \log^* N)$ query time while using superlinear $O(n \log N \log^* N)$ space. We
obtain:

Theorem 2 Let $\mathcal{S}$ be a grammar of size $n$ representing a string $S$ of length $N$. We can solve the longest
common extension problem in $O(\log N + \log^2 \ell)$ time and $O(n)$ space where $\ell$ is the length of the longest
common extension.

Note that we need to verify the Karp-Rabin fingerprints during preprocessing in order to obtain a
worst-case query time. Using the result from Bille et al. [10] this gives a randomized expected preprocessing
time of $O(N \log N)$.

Theorem 2 improves the $O(\log N \log \ell)$ solution to $O(\log N + \log^2 \ell)$. The new bound is always at
least as good and asymptotically better whenever $\ell = o(N^{\epsilon})$ where $\epsilon$ is a constant. The new result follows
by extending Theorem 1 to compute Karp-Rabin fingerprints and using these to perform the exponential
search of Bille et al.
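
The exponential search over fingerprints can be sketched on an uncompressed string, where plain prefix arrays stand in for the compressed data structure. This is an illustration under stated assumptions: the modulus `P`, the base `X`, the polynomial orientation of the hash, and the function names are choices made for this sketch, and the answer is correct only when no fingerprint collision occurs.

```python
P = (1 << 61) - 1      # assumed prime modulus (a Mersenne prime)
X = 911_382_323        # assumed base; chosen at random in practice

def build(s):
    """Prefix fingerprints pre[k] of s[:k] and powers pw[k] = X^k mod P."""
    pre, pw = [0], [1]
    for c in s:
        pre.append((pre[-1] * X + ord(c)) % P)
        pw.append(pw[-1] * X % P)
    return pre, pw

def fp(pre, pw, i, length):
    """Fingerprint of s[i:i+length] (0-indexed)."""
    return (pre[i + length] - pre[i] * pw[length]) % P

def lce(pre, pw, i, j):
    """Length of the longest common prefix of s[i:] and s[j:]."""
    n = len(pre) - 1
    max_len = n - max(i, j)

    def same(length):
        return fp(pre, pw, i, length) == fp(pre, pw, j, length)

    # Exponential phase: double the length until a mismatch or the end.
    hi = 1
    while hi <= max_len and same(hi):
        hi *= 2
    lo = hi // 2                   # prefixes of length lo match
    hi = min(hi, max_len + 1)      # length hi mismatches or is out of range
    # Binary phase: keep "lo matches, hi does not" until they are adjacent.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if same(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Both phases perform $O(\log \ell)$ fingerprint comparisons. In the compressed setting each comparison is the expensive step, which is where a fingerprint query with finger search behaviour pays off.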


2 Preliminaries 

Strings and Trees Let $S = S[1, |S|]$ be a string of length $|S|$. Denote by $S[i]$ the character in $S$ at
index $i$ and let $S[i, j]$ be the substring of $S$ of length $j - i + 1$ from index $i \ge 1$ to $|S| \ge j \ge i$, both
indices included.

Given a rooted tree $T$, we denote by $T(v)$ the subtree rooted in a node $v$ and the left and right child
of a node $v$ by left($v$) and right($v$) if the tree is binary. The nearest common ancestor nca($v, u$) of two
nodes $v$ and $u$ is the deepest node that is an ancestor of both $v$ and $u$. A weighted tree has weights on
its edges. A weighted ancestor query for node $v$ and weight $d$ returns the highest node $w$ such that the
sum of weights on the path from the root to $w$ is at least $d$.

Grammars and Straight Line Programs Grammar-based compression replaces a long string by a
small context-free grammar (CFG). We assume without loss of generality that the grammars are in fact
straight-line programs (SLPs). The lefthand side of a grammar rule in an SLP has exactly one variable,
and the righthand side has either exactly two variables or one terminal symbol. In addition, SLPs are
unambiguous and acyclic. We view SLPs as a directed acyclic graph (DAG) where each rule corresponds
to a node with outgoing ordered edges to its variables. Let $\mathcal{S}$ be an SLP. As with trees, we denote the
left and right child of an internal node $v$ by left($v$) and right($v$). The unique string $S(v)$ of length $N_v$
is produced by a depth-first left-to-right traversal of $v$ in $\mathcal{S}$ and consists of the characters on the leaves in
the order they are visited. The corresponding parse tree for $v$ is denoted $T(v)$. We will use the following
result, which provides efficient random access from any node $v$ in $\mathcal{S}$.

Lemma 1 ([10]) Let $S$ be a string of length $N$ compressed into an SLP $\mathcal{S}$ of size $n$. Given a node $v \in \mathcal{S}$,
we can support random access in $S(v)$ in $O(\log N_v)$ time, and at the same time report the sequence
of heavy paths and their entry- and exit points in the corresponding depth-first traversal of $S(v)$. The
number of heavy paths visited is $O(\log N_v)$.

Karp-Rabin Fingerprints For a suitably large prime $p$ and $x \in [p]$ the Karp-Rabin fingerprint [29],
denoted $\phi(S[i,j])$, of the substring $S[i,j]$ is defined as $\phi(S[i,j]) = \sum_{i \le k \le j} S[k] x^{k-i} \bmod p$.


The key property is that for a random choice of x, two substrings of S match iff their fingerprints 
match (whp.), thus allowing us to compare substrings in constant time. We use the following well-known 
properties of fingerprints. 

Lemma 2 The Karp-Rabin fingerprints have the following properties: 


1) Given $\phi(S[i,j])$, the fingerprint $\phi(S[i, j \pm a])$ for some integer $a$, can be computed in $O(a)$ time.

2) Given fingerprints $\phi(S[1,i])$ and $\phi(S[1,j])$, the fingerprint $\phi(S[i,j])$ can be computed in $O(1)$ time.

3) Given fingerprints $\phi(S_1)$ and $\phi(S_2)$, the fingerprint $\phi(S_1 \cdot S_2) = \phi(S_1) \oplus \phi(S_2)$ can be computed in
$O(1)$ time.
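
The three properties can be checked on a small example. The sketch below is illustrative only: the modulus `P`, the base `X` and the helper names are assumptions, indices are 0-based Python slices rather than the 1-based ranges of the lemma, and powers of `X` are recomputed with `pow`, whereas the stated $O(1)$ bounds would require the needed powers to be available in constant time.

```python
P = (1 << 61) - 1          # assumed prime modulus (a Mersenne prime)
X = 1_000_003              # assumed base; drawn at random from [P] in practice
X_INV = pow(X, P - 2, P)   # inverse of X modulo P (Fermat's little theorem)

def fingerprint(s):
    """phi(s) = sum over k of s[k] * X^k mod P, k counted from the start of s."""
    h = 0
    for c in reversed(s):              # Horner's rule, highest power first
        h = (h * X + ord(c)) % P
    return h

def extend(fp, length, c):
    """Property 1, one step: fingerprint of s + c from that of s (len(s) = length)."""
    return (fp + ord(c) * pow(X, length, P)) % P

def from_prefixes(fp_i, fp_j, i):
    """Property 2: fingerprint of s[i:j] from those of the prefixes s[:i] and s[:j]."""
    return (fp_j - fp_i) * pow(X_INV, i, P) % P

def concat(fp1, len1, fp2):
    """Property 3: fingerprint of s1 + s2 from those of s1 and s2 (len(s1) = len1)."""
    return (fp1 + pow(X, len1, P) * fp2) % P
```

For example, `concat(fingerprint("abra"), 4, fingerprint("cadabra"))` equals `fingerprint("abracadabra")`, since shifting the second fingerprint by $x^{4}$ aligns its exponents with the positions in the concatenated string.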


3 Fringe Access 


In this section we consider the fringe access problem. Here the goal is to compactly represent the SLP, 
such that for any node $v$, we can efficiently access locations in the string $S(v)$ close to the start or the end
of the substring. The fringe access problem is the key component in our finger search data structures.
A straightforward solution to the fringe access problem is to apply a solution to the random access
problem. For instance, if we apply the random access solution from Bille et al. [10] stated in Lemma 1 we
immediately obtain a linear space solution with $O(\log N_v)$ access time, i.e., the access time is independent
of the distance to the start or the end of the string. This is an immediate consequence of the central
grammar decomposition technique of [10], and does not extend to solve fringe access efficiently. Our
main contribution in this section is a new approach that bypasses this obstacle. We show the following
result.


Lemma 3 Let $\mathcal{S}$ be an SLP of size $n$ representing a string of length $N$. Using $O(n)$ space, we can
support access to position $i$ of any node $v$, in time $O(\log(\min(i, N_v - i)) + \log\log N)$.

The key idea in this result is a van Emde Boas style decomposition of $\mathcal{S}$ combined with a predecessor
data structure on selected paths in the decomposition. To achieve linear space we reduce the predecessor
queries on these paths to a weighted ancestor query. We first give a data structure with
query time $O((\log\log N)^2 + \log(\min(i, N_v - i)))$. We then show how to reduce the query time to
$O(\log\log N + \log(\min(i, N_v - i)))$ by reducing the query time for small $i$. To do so we introduce an
additional decomposition and give a new data structure that supports fast weighted ancestor queries on
trees of small weighted height.

For simplicity and without loss of generality we assume that the access point $i$ is closest to the start of
$S(v)$, i.e., the goal is to obtain $O(\log i + \log\log N)$ time. By symmetry we can obtain the corresponding
result for access points close to the end of $S(v)$.


3.1 van Emde Boas Decomposition for Grammars 

We first define the vEB decomposition on the parse tree $T$ and then extend it to the SLP $\mathcal{S}$. In the
decomposition we use the ART decomposition by Alstrup et al.

ART Decomposition The ART decomposition introduced by Alstrup et al. decomposes a tree
into a single top tree and a number of bottom trees. Each bottom tree is a subtree rooted in a node of
minimal depth such that the subtree contains no more than $x$ leaves and the top tree is all nodes not in
a bottom tree. The decomposition has the following key property.

Lemma 4 (Alstrup et al.) The ART decomposition with parameter $x$ for a rooted tree $T$ with $N$ leaves produces a
top tree with at most $N/(x+1)$ leaves.
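
A small sketch may help fix the definition of the ART decomposition. It assumes a tree given as a dictionary from each node to its list of children (leaves map to an empty list); the function name and the example tree are hypothetical. The bound on the top tree follows because every leaf of the top tree has more than $x$ leaves below it, and these leaf sets are disjoint.

```python
def art_decomposition(children, root, x):
    """Return (top nodes, bottom-tree roots, top-tree leaves) for parameter x."""
    order, stack = [], [root]
    while stack:                       # preorder: parents before children
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    leaves = {}
    for v in reversed(order):          # children are handled before parents
        kids = children[v]
        leaves[v] = 1 if not kids else sum(leaves[c] for c in kids)
    # Top tree: every node with more than x leaves below it.
    top = [v for v in order if leaves[v] > x]
    top_set = set(top)
    # Bottom trees are rooted at the highest nodes with at most x leaves.
    bottom_roots = [c for v in top for c in children[v] if c not in top_set]
    if leaves[root] <= x:
        bottom_roots = [root]
    top_leaves = [v for v in top if all(c not in top_set for c in children[v])]
    return top, bottom_roots, top_leaves
```

On a complete binary tree with 8 leaves and $x = 2$, the top tree consists of the root and its two children, the four nodes below them are the roots of the bottom trees, and the top tree has 2 leaves, within the bound $8/(2+1)$.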

We are now ready to define the van Emde Boas (vEB) decomposition.

Figure 1: Example of the ART-decomposition and a leftmost top path. In the top, the nodes forming
the top tree are drawn. In the bottom, triangles representing the bottom trees with a number that is
the size of the bottom tree. $v$'s leftmost top path is shown as well, and the two trees hanging to the left
of this path, $l_1$ and $l_2$.


The van Emde Boas Decomposition We define the van Emde Boas decomposition of a tree
$T$ as follows. The van Emde Boas (vEB) decomposition of $T$ is obtained by recursively applying an
ART decomposition: Let $v = \mathrm{root}(T)$ and $x = \sqrt{N}$. If $N_v = O(1)$, stop. Otherwise, construct an
ART decomposition of $T(v)$ with parameter $x$. For each bottom tree $T(u)$ recursively construct a vEB
decomposition with $v = u$ and $x = \sqrt{x}$.

Define the level of a node $v$ in $T$ as $\mathrm{level}(v) = \lfloor \log\log N - \log\log N_v \rfloor$ (this corresponds to the depth
of the recursion when $v$ is included in its top tree).

Note that except for the nodes on the lowest level, which are not in any top tree, all nodes belong
to exactly one top tree. For any node $v \in T$ not in the last level, let $T_{top}(v)$ be the top tree $v$ belongs
to. The leftmost top path of $v$ is the path from $v$ to the leftmost leaf of $T_{top}(v)$. See Figure 1.

Intuitively, the vEB decomposition of T defines a nested hierarchy of subtrees that decrease by at 
least the square root of the size at each step. 

The van Emde Boas Decomposition of Grammars Our definition of the vEB decomposition of
trees can be extended to SLPs as follows. Since the vEB decomposition is based only on the length
$N_v$ of the string generated by each node $v$, the definition of the vEB decomposition is also well-defined on
SLPs. As in the tree, all nodes belong to at most one top DAG. We can therefore reuse the terminology
from the definition for trees on SLPs as well.

To compute the vEB decomposition first determine the level of each node and then remove all edges
between nodes on different levels. This can be done in $O(n)$ time.
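
The two steps above can be sketched directly on an SLP stored as a dictionary. The example grammar and all names are hypothetical. The level is computed with integers only, using the fact that $\lfloor \log\log N - \log\log N_v \rfloor$ is the largest $k$ with $N_v^{2^k} \le N$; clamping nodes of size 1 to size 2 is an assumption of this sketch, made because $\log\log 1$ is undefined. The repeated squaring loop takes $O(\log\log N)$ steps per node, so this simple version does not attain the $O(n)$ bound stated above.

```python
# Terminal rules map to a character, all other rules to a (left, right) pair.
RULES = {
    "X0": "a",
    "X1": ("X0", "X0"),   # derives a string of length 2
    "X2": ("X1", "X1"),   # length 4
    "X3": ("X2", "X2"),   # length 8
    "X4": ("X3", "X3"),   # length 16
}

def compute_sizes(rules):
    """Length of the string derived by every symbol, memoised over the DAG."""
    sizes = {}

    def size(sym):
        if sym not in sizes:
            rhs = rules[sym]
            sizes[sym] = 1 if isinstance(rhs, str) else size(rhs[0]) + size(rhs[1])
        return sizes[sym]

    for sym in rules:
        size(sym)
    return sizes

def veb_level(n_v, n):
    """floor(log log n - log log n_v), computed exactly by repeated squaring."""
    m = max(n_v, 2)        # assumption: constant-size nodes are clamped to 2
    level = 0
    while m * m <= n:
        m *= m
        level += 1
    return level

def veb_decompose(rules, root):
    """Levels of all nodes and the edges that stay inside a single level."""
    sizes = compute_sizes(rules)
    n = sizes[root]
    levels = {v: veb_level(sizes[v], n) for v in rules}
    kept = [(v, c)
            for v, rhs in rules.items() if not isinstance(rhs, str)
            for c in rhs if levels[c] == levels[v]]
    return levels, kept
```

For the example grammar with $N = 16$, the nodes of size 16 and 8 get level 0, the node of size 4 gets level 1, and the nodes of size 2 and 1 get level 2, so only the edges from `X4` to `X3` and from `X1` to `X0` survive.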

3.2 Data Structure 

We first present a data structure that achieves $O((\log\log N)^2 + \log i)$ time. In the next section we then
show how to improve the running time to the desired $O(\log\log N + \log i)$ bound.

Our data structure contains the following information for each node $v \in \mathcal{S}$. Let $l_1, l_2, \ldots, l_k$ be the nodes
hanging to the left of $v$'s leftmost top path (excluding nodes hanging from the bottom node).

• The length $N_v$ of $S(v)$.

• The sum of the sizes of nodes hanging to the left of $v$'s leftmost top path $s_v = |l_1| + |l_2| + \cdots + |l_k|$.

• A pointer $b_v$ to the bottom node on $v$'s leftmost top path.

• A predecessor data structure over the sequence $1, |l_1| + 1, |l_1| + |l_2| + 1, \ldots, \sum_{i=1}^{k-1} |l_i| + 1$. We will
later show how to represent this data structure.

In addition we also build the data structure from Lemma 1 that given any node $v$ supports random
access to $S(v)$ in $O(\log N_v)$ time using $O(n)$ space.

To perform an access query we proceed as follows. Suppose that we have reached some node $v$ and
we want to compute $S(v)[i]$. We consider the following five cases (when multiple cases apply take the
first):

1. If $N_v = O(1)$. Decompress $S(v)$ and return the $i$'th character.

2. If $i \le s_v$. Find the predecessor $p$ of $i$ in $v$'s predecessor structure and let $u$ be the corresponding
node. Recursively find $S(u)[i - p]$.

3. If $i \le s_v + N_{\mathrm{left}(b_v)}$. Recursively find $S(\mathrm{left}(b_v))[i - s_v]$.

4. If $i \le s_v + N_{b_v}$. Recursively find $S(\mathrm{right}(b_v))[i - s_v - N_{\mathrm{left}(b_v)}]$.

5. In all other cases, perform a random access for $i$ in $S(v)$ using Lemma 1.

To see correctness, first note that case (1) and (5) are correct by definition. Case (2) is correct since
when $i \le s_v$ we know the $i$'th leaf must be in one of the trees hanging to the left of the leftmost top
path, and the predecessor query ensures we recurse into the correct one of these bottom trees. In case
(3) and (4) we check if the $i$'th leaf is either in the left or right subtree of $b_v$ and if it is, we recurse into
the correct one of these.


Compact Predecessor Data Structures We now describe how to represent the predecessor data
structure. Simply storing a predecessor structure in every single node would use $O(n^2)$ space. We can
reduce the space to $O(n)$ using ideas similar to the construction of the "heavy path suffix forest" in [10].

Let $L$ denote the leftmost top path forest. The nodes of $L$ are the nodes of $\mathcal{S}$. A node $u$ is the
parent of $v$ in $L$ iff $u$ is a child of $v$ in $\mathcal{S}$ and $u$ is on $v$'s leftmost top path. Thus, a leftmost top path
$v_1, \ldots, v_k$ in $\mathcal{S}$ is a sequence of ancestors from $v_1$ in $L$. The weight of an edge $(u, v)$ in $L$ is 0 if $u$ is a
left child of $v$ in $\mathcal{S}$ and otherwise $N_{\mathrm{left}(v)}$. Several leftmost top paths in $\mathcal{S}$ can share the same suffix, but
the leftmost top path of a node in $\mathcal{S}$ is uniquely defined and thus $L$ is a forest. A leftmost top path ends
in a leaf in the top DAG, and therefore $L$ consists of $O(n)$ trees each rooted at a unique leaf of a top
DAG. A predecessor query on the sequence $1, |l_1| + 1, |l_1| + |l_2| + 1, \ldots, \sum_{i=1}^{k-1} |l_i| + 1$ now corresponds to
a weighted ancestor query in $L$. We plug in the weighted ancestor data structure from Farach-Colton
and Muthukrishnan [18], which supports weighted ancestor queries in a forest in $O(\log\log n + \log\log U)$
time with $O(n)$ preprocessing and space, where $U$ is the maximum weight of a root-to-leaf path and $n$
the number of leaves. We have $U = N$ and hence the time for queries becomes $O(\log\log N)$.


Space and Preprocessing Time For each node in $\mathcal{S}$ we store a constant number of values, which
takes $O(n)$ space. Both the predecessor data structure and the data structure for supporting random
access from Lemma 1 take $O(n)$ space, so the overall space usage is $O(n)$. The vEB decomposition
can be computed in $O(n)$ time. The leftmost top paths and the information saved in each node can be
computed in linear time. The predecessor data structure uses linear preprocessing time, and thus the
total preprocessing time is $O(n)$.

Query Time Consider each case of the recursion. The time for case (1), (3) and (4) is trivially
$O(1)$. Case (2) is $O(\log\log N)$ since we perform exactly one predecessor query in the predecessor data
structure.

In case (5) we make a random access query in a node of size $N_v$. From Lemma 1 we have that
the query time is $O(\log N_v)$. We know $\mathrm{level}(v) = \mathrm{level}(b_v)$ since they are on the same leftmost top
path. From the definition of the level it follows for any pair of nodes $u$ and $w$ with the same level
that $N_u \ge \sqrt{N_w}$ and thus $N_{b_v} \ge \sqrt{N_v}$. From the conditions we have $i > s_v + N_{b_v} \ge N_{b_v} \ge \sqrt{N_v}$.
Since $\sqrt{N_v} \le i \Leftrightarrow \log N_v \le 2\log i$ we have $\log N_v = O(\log i)$ and thus the running time for case (5) is
$O(\log N_v) = O(\log i)$.

Case (1) and (5) terminate the algorithm and can thus not happen more than once. Case (2), (3)
and (4) are repeated at most $O(\log\log N)$ times since the level of the node we recurse on increments by
at least one in each recursive call, and the level of a node is at most $O(\log\log N)$. The overall running
time is therefore $O((\log\log N)^2 + \log i)$.

In summary, we have the following result.

Lemma 5 Let $\mathcal{S}$ be an SLP of size $n$ representing a string of length $N$. Using $O(n)$ space, we can
support access to position $i$ of any node $v$, in time $O(\log i + (\log\log N)^2)$.


3.3 Improving the Query Time for Small Indices 

The above algorithm obtains the running time $O(\log i)$ for $i \ge 2^{(\log\log N)^2}$. We will now improve the
running time to $O(\log\log N + \log i)$ by improving the running time in the case when $i < 2^{(\log\log N)^2}$.

In addition to the data structure from above, we add another copy of the data structure with a few
changes. When answering a query, we first check if $i \ge 2^{(\log\log N)^2}$. If $i \ge 2^{(\log\log N)^2}$ we use the original
data structure, otherwise we use the new copy.

The new copy of the data structure is implemented as follows. In the first level of the ART-
decomposition let $x = 2^{(\log\log N)^2}$ instead of $\sqrt{N}$. For the rest of the levels use $\sqrt{x}$ as before. Furthermore,
we split the resulting new leftmost top path forest $L$ into two disjoint parts: $L_1$ consisting
of all nodes with level 1 and $L_{\ge 2}$ consisting of all nodes with level at least 2. For $L_1$ we use the
weighted ancestor data structure by Farach-Colton and Muthukrishnan [18] as in the previous section
using $O(\log\log n + \log\log N) = O(\log\log N)$ time. However, if we apply this solution for $L_{\ge 2}$ we end up
with a query time of $O(\log\log n + \log\log x)$, which does not lead to an improved solution. Instead, we
present a new data structure that supports queries in $O(\log\log x)$ time.

Lemma 6 Given a tree $T$ with $n$ leaves where the sum of edge weights on any root-to-leaf path is at
most $x$ and the height is at most $x$, we can support weighted ancestor queries in $O(\log\log x)$ time using
$O(n)$ space and preprocessing time.

Proof. Create an ART-decomposition of $T$ with parameter $x$. For each bottom tree in the decomposition
construct the weighted ancestor structure from [18]. For the top tree, construct a predecessor structure
over the accumulated edge weights for each root-to-leaf path.

To perform a weighted ancestor query on a node in a bottom tree, we first perform a weighted ancestor
query using the data structure for the bottom tree. In case we end up in the root of the bottom tree,
we continue with a predecessor search in the top tree from the leaf corresponding to the bottom tree.

The total space for bottom trees is $O(n)$. Since the top tree has $O(n/x)$ leaves and height at most $x$,
the total space for all predecessor data structures on root-to-leaf paths in the top tree is $O(n/x \cdot x) = O(n)$.
Hence, the total space is $O(n)$.

A predecessor query in the top tree takes $O(\log\log x)$ time. The number of nodes in each bottom
tree is at most $x^2$ since it has at most $x$ leaves and height $x$ and the maximum weight of a root-to-leaf
path is $x$ giving weighted ancestor queries in $O(\log\log x^2 + \log\log x) = O(\log\log x)$ time. Hence, the
total query time is $O(\log\log x)$. □


We reduce the query time for queries with $i < 2^{(\log\log N)^2}$ using the new data structure. The level
of any node in the new structure is at most $O(1 + \log\log 2^{(\log\log N)^2}) = O(\log\log\log N)$. A weighted
ancestor query in $L_1$ takes time $O(\log\log N)$. For weighted ancestor queries in $L_{\ge 2}$, we know any node $v$
has height at most $2^{(\log\log N)^2}$ and on any root-to-leaf path the sum of the weights is at most $2^{(\log\log N)^2}$.
Hence, by Lemma 6 we support queries in $O(\log\log 2^{(\log\log N)^2}) = O(\log\log\log N)$ time for nodes in
$L_{\ge 2}$.

We make at most one weighted ancestor query in $L_1$, the remaining ones are made in $L_{\ge 2}$, and thus
the overall running time is $O(\log\log N + (\log\log\log N)^2 + \log i) = O(\log\log N + \log i)$.

In summary, this completes the proof of Lemma 3.


4 Static Finger Search 

We now show how to apply our solution to the fringe access problem to obtain a simple data structure for the
static finger search problem. This solution will be the starting point for solving the dynamic case in the
next section, and we will use it as a key component in our result for the longest common extension problem.

Similar to the fringe search problem we assume without loss of generality that the access point i is 
to the right of the finger. 
Figure 2: Illustration of the data structure for a finger pointing at $f$ and an access query at location $i$. 
$h_1, h_2, h_3$ are the heavy paths visited when finding the finger, $u$ corresponds to $NCA(v_f, v_i)$ in the 
parse tree and $h_s$ is the heavy path on which $u$ lies, which we use to find $u$. $a$ is a value calculated 
during the access query. 


Data Structure We store the random access data structure from [10] used in Lemma [?] and the fringe 
search data structures from above. Also from [10] we store the data structure that for any heavy path $h$ 
starting in a node $v$ and an index $i$ of a leaf in $T(v)$ gives the exit-node from $h$ when searching for $i$ in 
$O(\log\log N)$ time and uses $O(n)$ space. 

To represent a finger the key idea is to store a compact data structure for the corresponding root-to-leaf 
path in the grammar that allows us to navigate it efficiently. Specifically, let $f$ be the position 
of the current finger and let $p = v_1 \ldots v_k$ denote the path in $S$ from the root to $v_f$ ($v_1 = root$ and 
$v_k = v_f$). Decompose $p$ into the $O(\log N)$ heavy paths it intersects, and call these 
$h_j = v_1 \ldots v_{i_1}$, $h_{j-1} = v_{i_1+1} \ldots v_{i_2}$, $\ldots$, $h_1 = v_{i_{j-1}+1} \ldots v_k$. Let $v(h_i)$ be the topmost 
node on $h_i$ ($v(h_j) = v_1$, $v(h_{j-1}) = v_{i_1+1}$, $\ldots$). 

Let $l_j$ be the index of $f$ in $S(v(h_j))$ and $r_j = N_{v(h_j)} - l_j$. For the finger we store: 


1. The sequence $r_1, r_2, \ldots, r_j$ (note $r_1 < r_2 < \cdots < r_j$). 

2. The sequence $v(h_1), v(h_2), \ldots, v(h_j)$. 

3. The string $F_T = S[f+1, f+\log N]$. 


Analysis The random access and fringe search data structures both require $O(n)$ space. Each of the 3 
items above requires $O(\log N)$ space and thus the finger takes up $O(\log N)$ space. The total space usage 
is $O(n)$. 


Setfinger We implement setfinger($f$) as follows. First, we apply Lemma [?] to make random access to 
position $f$. This gives us the sequence of visited heavy paths, which exactly corresponds to 
$h_j, h_{j-1}, \ldots, h_1$, including the corresponding $l_i$ values from which we can calculate the $r_i$ values. 
So we update the sequences accordingly. Finally, decompress and save the string $F_T = S[f+1, f+\log N]$. 

The random access to position $f$ takes $O(\log N)$ time. In addition to this we perform a constant 
number of operations for each heavy path $h_i$, which in total takes $O(\log N)$ time. Decompressing a string 
of $\log N$ characters can be done in $O(\log N)$ time (using [?]). In total, we use $O(\log N)$ time. 
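
The finger contents produced by setfinger can be illustrated on a toy grammar. The following is a minimal sketch, not the data structure of the paper: it walks the root-to-leaf path naively (time proportional to the grammar height rather than $O(\log N)$), decompresses the whole string to obtain $F_T$, and assumes that the heavy child of a node is the child with the larger expansion, with ties going left. The grammar `RULES` and the names `size`, `expand` and `set_finger` are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy SLP: a rule is a single character or a pair (left, right).
RULES = {
    "A": "a", "B": "b",
    "X1": ("A", "B"),    # ab
    "X2": ("X1", "A"),   # aba
    "X3": ("X2", "X1"),  # abaab
    "X4": ("X3", "X2"),  # abaababa
}

@lru_cache(maxsize=None)
def size(v):
    """N_v: length of the string S(v) generated by node v."""
    rule = RULES[v]
    return 1 if isinstance(rule, str) else size(rule[0]) + size(rule[1])

def expand(v):
    """Naively decompress S(v)."""
    rule = RULES[v]
    return rule if isinstance(rule, str) else expand(rule[0]) + expand(rule[1])

def set_finger(root, f, window):
    """Return (r, tops, F) for a finger at 1-based position f."""
    tops, ls = [root], [f]          # top node and index of f for each heavy path
    v, pos = root, f
    while not isinstance(RULES[v], str):
        left, right = RULES[v]
        heavy_is_left = size(left) >= size(right)   # ties go left (assumption)
        go_left = pos <= size(left)
        if not go_left:
            pos -= size(left)
        v = left if go_left else right
        if go_left != heavy_is_left:                # light edge: a new heavy path starts
            tops.append(v)
            ls.append(pos)
    tops.reverse()                  # index 0 is now the bottom-most path h_1
    ls.reverse()
    r = [size(t) - l for t, l in zip(tops, ls)]     # r_j = N_{v(h_j)} - l_j
    F = expand(root)[f:f + window]  # S[f+1, f+window] in 1-based notation
    return r, tops, F

r, tops, F = set_finger("X4", 4, 3)
print(r, tops, F)   # [1, 4] ['X1', 'X4'] bab
```

In this example the path to position 4 crosses one light edge, so the finger consists of two heavy paths with top nodes `X1` and `X4`.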


Access To perform access($i$) ($i > f$), there are two cases. If $D = i - f \leq \log N$ we simply return the 
stored character $F_T[D]$ in constant time. Otherwise, we compute the node $u = nca(v_f, v_i)$ in the parse 
tree $T$ as follows. First find the index $s$ of the successor to $D$ in the sequence $r_1, r_2, \ldots, r_j$ using 
binary search. Now we know that $u$ is on the heavy path $h_s$. Find the exit-nodes from $h_s$ when searching for 
respectively $i$ and $f$ using the data structure from [10]; the topmost of these two is $u$. See Fig. 2. 
Finally, we compute $a$ as the index of $f$ in $T(left(u))$ from the right and use the data structure for fringe 
search from Lemma [?] to compute $S(right(u))[i - f - a]$. 

For $D \leq \log N$, the operation takes constant time. For $D > \log N$, the binary search over a sequence 
of $O(\log N)$ elements takes $O(\log\log N)$ time, finding the exit-nodes takes $O(\log\log N)$ time, and the 
fringe search takes $O(\log(i - f - a)) = O(\log D)$ time. Hence, in total $O(\log\log N + \log D) = O(\log D)$ 
time. 
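
The case distinction in access, and the successor search that locates the heavy path containing $u$, can be sketched as follows. This is only an illustration of the first two steps; the exit-node lookup and the fringe search are not implemented. The function name `access_case` is hypothetical, `r` is the increasing sequence $r_1, \ldots, r_j$ stored as a 0-based Python list, and the sketch assumes $1 \leq D \leq r_j$.

```python
from bisect import bisect_left

def access_case(r, F, D):
    """Classify access(f + D) for D >= 1, given the finger data (r, F)."""
    if D <= len(F):
        return ("stored", F[D - 1])   # F[D] in the 1-based notation of the text
    s = bisect_left(r, D)             # smallest index with r[s] >= D (successor of D)
    return ("path", s)                # 0-based index of the heavy path containing u

print(access_case([1, 4], "ba", 2))   # ('stored', 'a')
print(access_case([1, 4], "ba", 3))   # ('path', 1)
```

The successor is the right choice because position $f + D$ lies inside $S(v(h_s))$ exactly when $D \leq r_s$, so the smallest such $s$ identifies the lowest heavy path whose top node still covers the accessed position.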

This completes the proof of Theorem [?](i). 


5 Dynamic Finger Search 

In this section we show how to extend the solution from Section 4 to handle dynamic finger search. The 
target is to support the movefinger operation that moves the current finger, where the time it takes 
depends on how far the finger is moved. Obviously, it should be faster than simply using the setfinger 
operation. The key difference from the static finger is a new decomposition of a root-to-leaf path into 
paths. The new decomposition is based on a combination of heavy paths and leftmost top paths, which 
we will show first. Then we show how to change the data structure to use this decomposition, and how 
to modify the operations accordingly. Finally, we consider how to generalize the solution to work when 
movefinger/access might be both to the left and to the right of the current finger, which for this 
solution does not follow trivially by symmetry. 

Before we start, let us see why the data structure for the static finger cannot directly be used for 
dynamic finger. Suppose we have a finger pointing at $f$ described by $O(\log N)$ heavy paths. It might be 
the case that after a movefinger($f + 1$) operation, the finger is described by $O(\log N)$ completely 
different heavy paths. In this case we must do $O(\log N)$ work to keep our finger data structure updated. 
This can for instance happen when the current finger is pointing at the right-most leaf in the left subtree 
of the root. 

Furthermore, in the solution to the static problem, we store the substring $S[f+1, f+\log N]$ decompressed 
in our data structure. If we perform a movefinger($f + \log N$) operation nothing of this substring 
can be reused. To decompress $\log N$ characters takes $\Omega(\log N)$ time, thus we cannot do this in the 
movefinger operation and still get something faster than $O(\log N)$. 


5.1 Left Heavy Path Decomposition of a Path 

Let $p = v_1 \ldots v_k$ be a root-to-leaf path in $S$. A subpath $p_i = v_a \ldots v_b$ of $p$ is a maximal heavy 
subpath if $v_a \ldots v_b$ is part of a heavy path and $v_{b+1}$ is not on the same heavy path. Similarly, a subpath 
$p_i = v_a \ldots v_b$ of $p$ is a maximal leftmost top subpath if $v_a \ldots v_b$ is part of a leftmost top path and 
$level(v_b) \neq level(v_{b+1})$. 

A left heavy path decomposition is a decomposition of a root-to-leaf path $p$ into an arbitrary sequence 
$p_1 \ldots p_j$ of maximal heavy subpaths, maximal leftmost top subpaths and (non-maximal) leftmost top 
subpaths immediately followed by maximal heavy subpaths. 

Define $v(p_i)$ as the topmost node on the subpath $p_i$. Let $l_j$ be the index of the finger $f$ in $S(v(p_j))$ 
and $r_j = N_{v(p_j)} - l_j$. Let $t(p_i)$ be the type of $p_i$, either heavy subpath (HP) or leftmost top subpath 
(LTP). 

A left heavy path decomposition of a root-to-leaf path $p$ is not unique. The heavy path decomposition 
of $p$ is always a valid left heavy path decomposition as well. The heavy paths and leftmost top paths 
visited during fringe search are always maximal and thus always form a valid left heavy path decomposition. 

Lemma 7 The number of paths in a left heavy path decomposition is $O(\log N)$. 

Proof. There are at most $O(\log N)$ heavy paths that intersect a root-to-leaf path (Lemma [?]). 
Each of these can be used at most once because of the maximality. So there can be at most $O(\log N)$ 
maximal heavy paths. Each time there is a maximal leftmost top path, the level of the following node 
on $p$ increases. This can happen at most $O(\log\log N)$ times. Each non-maximal leftmost top path is 
followed by a maximal heavy path, and since there are only $O(\log N)$ of these, this can happen at most 
$O(\log N)$ times. Therefore the sequence of paths has length $O(\log N + \log\log N + \log N) = O(\log N)$. 


5.2 Data Structure 


We use the data structures from [10] as in the static variant and the fringe access data structure with 
an extension. In the fringe access data structure there is a predecessor data structure for all the nodes 
hanging to the left of a leftmost top path. To support access and movefinger we need to find a node 
hanging to the left or right of a leftmost top path. We can do this by storing an identical predecessor 
structure for the accumulated sizes of the nodes hanging to the right of each leftmost top path. Again, 
the space usage for this predecessor structure can be reduced to $O(n)$ by turning it into a weighted 
ancestor problem. 

To represent a finger the idea is again to have a compact data structure representing the root-to-leaf 
path corresponding to the finger. This time we will base it on a left heavy path decomposition instead 
of a heavy path decomposition. Let $f$ be the current position of the finger. For the root-to-leaf path to 
$v_f$ we maintain a left heavy path decomposition, and store the following for a finger: 

1. The sequence $r_1, r_2, \ldots, r_j$ ($r_1 < r_2 < \cdots < r_j$) on a stack with the last element on top. 

2. The sequence $v(p_1), v(p_2), \ldots, v(p_j)$ on a stack with the last element on top. 

3. The sequence $t(p_1), t(p_2), \ldots, t(p_j)$ on a stack with the last element on top. 

Analysis The fringe access data structure takes up $O(n)$ space. For each path in the left heavy path 
decomposition we use constant space. By Lemma 7, the space usage of the finger is $O(\log N) = O(n)$. 

Setfinger Use fringe access (Lemma [?]) to access position $f$. This gives us a sequence of leftmost top 
paths and heavy paths visited during the fringe access, which is a valid left heavy path decomposition. 
Calculate $r_i$ for each of these and store the three sequences of $r_i$, $v(p_i)$ and $t(p_i)$ on stacks. 

The fringe access takes $O(\log f + \log\log N)$ time. The number of subpaths visited during the fringe 
access cannot be more than $O(\log f + \log\log N)$ and we only perform constant extra work for each of 
these. 

Access To implement access($i$) for $i > f$ we have to find $u = nca(v_i, v_f)$ in $T$. Find the index $s$ of 
the successor to $D = i - f$ in $r_1, r_2, \ldots, r_j$ using binary search. We know $nca(v_i, v_f)$ lies on $p_s$ and 
$v_i$ is in a subtree that hangs off $p_s$. The exit-nodes from $p_s$ to $v_f$ and $v_i$ are now found; the topmost 
of these two is $nca(v_i, v_f)$. If $t(p_s) = HP$ then we can use the same data structure as in the static case, 
otherwise we perform the predecessor query on the extra predecessor data structure for the nodes hanging off 
the leftmost top path. Finally, we compute $a$ as the index of $f$ in $S(left(u))$ from the right and use the 
data structure for fringe access from Lemma [?] to compute $S(right(u))[i - f - a]$. 

The binary search on $r_1, r_2, \ldots, r_j$ takes $O(\log\log N)$ time. Finding the exit-nodes from $p_s$ takes 
$O(\log\log N)$ time in either case. Finally the fringe access takes $O(\log(i - f - a) + \log\log N) = 
O(\log D + \log\log N)$ time. Overall it takes $O(\log D + \log\log N)$ time. 

Note the extra $O(\log\log N)$ term, which arises because we have not decompressed the first $\log N$ 
characters following the finger. 

Movefinger To move the finger we combine the access and setfinger operations. Find the index $s$ of 
the successor to $D = i - f$ in $r_1, r_2, \ldots, r_j$ using binary search. Now we know $u = nca(v_i, v_f)$ must lie 
on $p_s$. Find $u$ in the same way as when performing access. From all of the stacks pop all elements above 
index $s$. Compute $a$ as the index of $f$ in $S(left(u))$ from the right. The finger should be moved to index 
$i - f - a$ in $right(u)$. First look at the heavy path $right(u)$ lies on and find the proper exit-node $w$ using 
the data structure from [10]. Then continue with fringe search from the proper child of $w$. This gives a 
heavy path followed by a sequence of maximal leftmost top paths and heavy paths needed to reach $v_i$ 
from $right(u)$; push the $r_j$, $v(p_j)$, and $t(p_j)$ values for these on top of the respective stacks. 

We now verify that the sequence of paths we maintain is still a valid left heavy path decomposition. Since 
fringe search gives a sequence of paths that is a valid left heavy path decomposition, the only problem 
might be that $p_s$ is no longer maximal. If $p_s$ is a heavy path it will still be maximal, but if $p_s$ is a 
leftmost top path then $level(u)$ and $level(right(u))$ might be equal. But this possibly non-maximal leftmost 
top path is always followed by a heavy path. Thus the overall sequence of paths remains a left heavy path 
decomposition. 

The successor query in $r_1, r_2, \ldots, r_j$ takes $O(\log\log N)$ time. Finding $u$ on $p_s$ takes $O(\log\log N)$ 
time, and so does finding the exit-node on the following heavy path. Popping a number of elements from the 
top of the stacks can be done in $O(1)$ time. Finally the fringe access takes $O(\log(i - f - a) + \log\log N) = 
O(\log D + \log\log N)$ time, including pushing the right elements on the stacks. Overall the running time is 
therefore $O(\log D + \log\log N)$. 
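
The stack manipulation in movefinger can be sketched as follows. This is a hedged illustration of the bookkeeping only: the paths to push (`new_paths`) would come from the heavy path lookup and the fringe search, which are not implemented here. Two choices are assumptions of the sketch and are not taken from the text: the stacks are Python lists with the root path at index 0, and instead of $r_k$ the sketch stores `end[k]` $= f + r_k$, the last text position covered by $S(v(p_k))$, so that retained entries stay valid when the finger moves. The class and method names are hypothetical.

```python
class Finger:
    """Finger stored as three parallel stacks (Python lists, root path at index 0)."""

    def __init__(self, f):
        self.f = f
        self.end = []    # end[k] = f + r_k: last text position covered by S(v(p_k))
        self.top = []    # v(p_k)
        self.kind = []   # t(p_k): "HP" or "LTP"

    def push(self, end, top, kind):
        self.end.append(end)
        self.top.append(top)
        self.kind.append(kind)

    def find_path(self, i):
        """Deepest stack index s whose top node still covers position i (f < i <= end[0])."""
        lo, hi = 0, len(self.end) - 1
        while lo < hi:                       # invariant: end[lo] >= i
            mid = (lo + hi + 1) // 2
            if self.end[mid] >= i:
                lo = mid
            else:
                hi = mid - 1
        return lo

    def move(self, i, new_paths):
        """Keep paths 0..s, drop the rest, then push the paths leading to position i."""
        s = self.find_path(i)
        for stack in (self.end, self.top, self.kind):
            del stack[s + 1:]
        for end, top, kind in new_paths:
            self.push(end, top, kind)
        self.f = i
        return s

# Hypothetical finger at position 4 of an 8-character string.
finger = Finger(4)
finger.push(8, "X4", "HP")   # root path covers positions 1..8
finger.push(5, "X1", "HP")   # lower path covers positions 4..5
s = finger.move(7, [(8, "X2", "HP"), (7, "B", "HP")])
print(s, finger.end, finger.top)   # 0 [8, 8, 7] ['X4', 'X2', 'B']
```

Deleting a list slice in Python costs time proportional to the number of removed entries; an array with an explicit stack pointer would give the constant-time pop used in the analysis above.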
5.3 Moving/Access to the Left of the Finger 

In the above we have assumed $i > f$; we will now show how this assumption can be removed. It is 
easy to see we can mirror all data structures and we will have a solution that works for $i < f$ instead. 
Unfortunately, we cannot just use a copy of each independently, since one of them only supports moving 
the finger to the left and the other only supports moving to the right. We would like to support moving 
the finger left and right arbitrarily. This was not a problem with the static finger since we could just 
perform setfinger in both the mirrored and non-mirrored data structures in $O(\log N)$ time. 

Instead we extend our finger data structure. First we extend the left heavy path decomposition to 
a left right heavy path decomposition by adding another type of paths to it, namely rightmost top paths 
(the mirrored version of leftmost top paths). Thus a left right heavy path decomposition is a 
decomposition of a root-to-leaf path $p$ into an arbitrary sequence $p_1 \ldots p_j$ of maximal heavy subpaths, 
maximal leftmost/rightmost top subpaths and (non-maximal) leftmost/rightmost top subpaths immediately 
followed by maximal heavy subpaths. Now $t(p_i) = HP | LTP | RTP$. Furthermore, we save the sequence 
$l_1, l_2, \ldots, l_j$ ($l_j$ being the left index of $f$ in $T(v(p_j))$) on a stack like the $r_1, r_2, \ldots, r_j$ values, etc. 

When we do access and movefinger where $i < f$, the subpath $p_s$ where $nca(v_f, v_i)$ lies can be found 
by binary search on the $l_j$ values instead of the $r_j$ values. Note the $l_j$ values are sorted on the stack, 
just like the $r_j$ values. The following heavy path lookup/fringe access should now be performed on $left(u)$ 
instead of $right(u)$. The remaining operations can just be performed in the same way as before. 


6 Finger Search with Fingerprints and Longest Common Extensions 

We show how to extend our finger search data structure from Theorem [?](i) to support computing 
fingerprints and then apply the result to compute longest common extensions. First, we will show how 
to return a fingerprint for $S(v)[1, i]$ when performing access on the fringe of $v$. 

6.1 Fast Fingerprints on the Fringe 

To do this, we need to store some additional data for each node $v \in S$. We store the fingerprint $\phi(S(v))$ 
and the concatenation of the fingerprints of the nodes hanging to the left of the leftmost top path, 
$p_v = \phi(S(l_1)) \oplus \phi(S(l_2)) \oplus \cdots \oplus \phi(S(l_k))$. We also need the following lemma: 

Lemma 8 ([?]) Let $S$ be a string of length $N$ compressed into an SLP $S$ of size $n$. Given a node $v \in S$, 
we can find the fingerprint $\phi(S(v)[1, i])$ where $1 \leq i \leq N_v$ in $O(\log N_v)$ time. 

Suppose we are in a node $v$ and we want to calculate the fingerprint $\phi(S(v)[1, i])$. We perform an 
access query as before, but also maintain a fingerprint $p$, initially $p = \phi(\epsilon)$, computed thus far. 
We follow the same five cases as before, but add the following to update $p$: 

1. From the decompressed $S(v)$, calculate the fingerprint for $S(v)[1, i]$, now update $p = p \oplus \phi(S(v)[1, i])$. 

2. $p = p \oplus (\phi(p_v) \ominus_s \phi(p_u))$. 

3. $p = p \oplus \phi(p_v)$. 

4. $p = p \oplus \phi(p_v) \oplus \phi(S(left(b_v)))$. 

5. Use Lemma 8 to find the fingerprint for $S(v)[1, i]$ and then update with $p = p \oplus \phi(S(v)[1, i])$. 

These extra operations do not change the running time of the algorithm, so we can now find the 
fingerprint $\phi(S(v)[1, i])$ in time $O(\log\log N + \log(\min(i, N_v - i)))$. 
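
The updates above rely on fingerprints that can be concatenated and split. As a minimal sketch, assuming Karp-Rabin style polynomial fingerprints (an assumption of this example; the modulus, the fixed base and the function names are hypothetical), a fingerprint can be kept as a pair of hash value and length so that $\oplus$ and the removal of a known prefix take one modular exponentiation each.

```python
P = (1 << 61) - 1   # prime modulus (assumption of this sketch)
C = 257             # fixed base for reproducibility; usually chosen at random

def fp(s):
    """Fingerprint of a plain string as a (hash, length) pair."""
    h = 0
    for ch in s:
        h = (h * C + ord(ch)) % P
    return (h, len(s))

def concat(a, b):
    """phi(xy) from phi(x) and phi(y): the concatenation operator."""
    return ((a[0] * pow(C, b[1], P) + b[0]) % P, a[1] + b[1])

def strip_prefix(ab, a):
    """phi(y) from phi(xy) and phi(x): removes a known prefix."""
    n = ab[1] - a[1]
    return ((ab[0] - a[0] * pow(C, n, P)) % P, n)

print(concat(fp("aba"), fp("ab")) == fp("abaab"))          # True
print(strip_prefix(fp("abaab"), fp("aba")) == fp("ab"))    # True
```

Here `fp("")` plays the role of $\phi(\epsilon)$: concatenating it with any fingerprint leaves that fingerprint unchanged.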
6.2 Finger Search with Fingerprints 

Next we show how to do finger search while computing fingerprints between the finger $f$ and the access 
point $i$. 

When we perform setfinger($f$) we use the algorithm from [?] to compute fingerprints during the search 
of $S$ from the root to $f$. This allows us to subsequently compute for any heavy path $h_j$ on the root to 
position $f$ the fingerprint $p(h_j)$ of the concatenation of the strings generated by the subtrees hanging 
to the left of $h_j$. In addition, we explicitly compute and store the fingerprints $\phi(S[f, f+1])$, 
$\phi(S[f, f+2])$, $\ldots$, $\phi(S[f, f+\log N+1])$. In total, this takes $O(\log N)$ time. 

Suppose that we have now performed a setfinger($f$) operation. To implement access($i$), $i > f$, there 
are two cases. If $D = i - f \leq \log N$ we return the appropriate precomputed fingerprint. Otherwise, we 
compute the node $u = nca(v_f, v_i)$ in the parse tree $T$ as before. Let $h$ be the heavy path containing $u$. 
Using the data structure from we compute the fingerprint pi of the nodes hanging to the left of h above 
u in constant time. The fingerprint is now obtained as i]) = ph^ (Bpi®4)(S(right(u))[l, (i —/) —a]), 

where the latter is found using fringe access with fingerprints in $right(u)$. None of these additions change 
the asymptotic complexities of Theorem [?](i). Note that with the fingerprint construction in [?] we can 
guarantee that all fingerprints are collision-free. 

6.3 Longest Common Extensions 

Using the fingerprints it is now straightforward to implement lce queries as in [?]. Given an lce($i, j$) 
query, first set fingers at positions $i$ and $j$. This allows us to get fingerprints of the form 
$\phi(S[i, i+a])$ or $\phi(S[j, j+a])$ efficiently. Then, we find the largest value $\ell$ such that 
$\phi(S[i, i+\ell]) = \phi(S[j, j+\ell])$ using a standard exponential search. Setting the two fingers uses 
$O(\log N)$ time and by Theorem [?](i) the at most $O(\log \ell)$ searches in the exponential search take at 
most $O(\log \ell)$ time each. Hence, in total we use $O(\log N + \log^2 \ell)$ time, as desired. This completes 
the proof of Theorem [?]. 
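
The exponential search can be sketched independently of the compressed representation. In the following illustration the fingerprint oracle is a stand-in built from prefix hashes of an uncompressed string; in the setting above it would be answered by finger search with fingerprints. The names `make_fp_range` and `lce`, the 0-based indexing, and the fixed hash base are assumptions of the sketch, and the returned value is the number of matching characters starting at the two positions.

```python
def make_fp_range(s, P=(1 << 61) - 1, C=257):
    """Stand-in oracle: fp_range(i, k) fingerprints the k characters starting at 0-based i."""
    pre, pw = [0], [1]
    for ch in s:
        pre.append((pre[-1] * C + ord(ch)) % P)
        pw.append(pw[-1] * C % P)
    def fp_range(i, k):
        return (pre[i + k] - pre[i] * pw[k]) % P
    return fp_range

def lce(fp_range, n, i, j):
    """Length of the longest common extension of positions i and j in a string of length n."""
    limit = n - max(i, j)
    if limit == 0 or fp_range(i, 1) != fp_range(j, 1):
        return 0
    k = 1                                   # invariant: the length-k extensions match
    while 2 * k <= limit and fp_range(i, 2 * k) == fp_range(j, 2 * k):
        k *= 2                              # exponential phase
    lo, hi = k, min(2 * k, limit)
    while lo < hi:                          # binary phase on [k, min(2k, limit)]
        mid = (lo + hi + 1) // 2
        if fp_range(i, mid) == fp_range(j, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

text = "abaababa"
oracle = make_fp_range(text)
print(lce(oracle, len(text), 0, 3))   # 3
```

Both phases issue $O(\log \ell)$ oracle calls, each on an extension of length at most $2\ell$, which is where the $\log^2 \ell$ term in the bound comes from.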